Multimodal Hugging Face wrapper, misc improvements, model debugging utils #396

jlamypoirier · 2025-11-27T00:15:28Z

✨ Description

Add HuggingfaceMultiModalModelForCausalLM, which wraps multimodal models for Hugging Face following the LLaVA format.
Merge the content of HuggingfaceBaseModelForCausalLM into HuggingfacePreTrainedModel and generalize it to arbitrary inputs.

Rework output_hidden_states into an extensive debugging utility built on the existing DebugLayer. When calling the model, one can request specific hidden states by passing a list of names in kwargs["output_hidden_states"] (output_hidden_states in the Hugging Face wrapper). The hidden states whose names match these entries (interpreted as regular expressions) are returned in kwargs["hidden_states"]. This is still experimental but has already helped a lot with debugging. Example:

>>> model_fast_llm(test_input, pixel_values=pixels, output_hidden_states=["vision_encoder.encoder.0.mixer", "head.logits"])
CausalLMOutputWithPast(loss=None, logits=tensor(...), past_key_values=[], hidden_states=
{'vision_encoder.encoder.0.mixer.query_rotary_input': tensor(...), 'vision_encoder.encoder.0.mixer.key_rotary_input': tensor(...),
'vision_encoder.encoder.0.mixer.query': tensor(...), 'vision_encoder.encoder.0.mixer.key': tensor(...),
'vision_encoder.encoder.0.mixer.value': tensor(...), 'vision_encoder.encoder.0.mixer.context': tensor(...),
'vision_encoder.encoder.0.mixer': tensor(...), 'head.logits': tensor(...)}, attentions=None)
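To make the selection rule concrete, here is a minimal sketch of regex-based hidden-state selection. It assumes each requested name is compiled as a regular expression and matched at the start of every available hidden-state name with re.match; the PR's actual matching code may differ. The helper select_hidden_states and the placeholder layer names and values are hypothetical.

```python
import re
from typing import Dict, Iterable


def select_hidden_states(available: Dict[str, object], requested: Iterable[str]) -> Dict[str, object]:
    """Keep the entries of `available` whose name matches any requested pattern.

    Hypothetical helper: each requested name is compiled as a regular expression
    and matched at the start of the hidden-state name with re.match (an assumption).
    """
    patterns = [re.compile(pattern) for pattern in requested]
    return {
        name: value
        for name, value in available.items()
        if any(pattern.match(name) for pattern in patterns)
    }


# Hypothetical debug names mapped to placeholder values instead of tensors.
layer_outputs = {
    "vision_encoder.encoder.0.mixer.query": "query",
    "vision_encoder.encoder.0.mixer": "mixer_output",
    "vision_encoder.encoder.1.mixer": "mixer_1_output",
    "head.logits": "logits",
}

selected = select_hidden_states(layer_outputs, ["vision_encoder.encoder.0.mixer", "head.logits"])
print(sorted(selected))
# ['head.logits', 'vision_encoder.encoder.0.mixer', 'vision_encoder.encoder.0.mixer.query']
```

Under this assumption a prefix such as "vision_encoder.encoder.0.mixer" also selects its sub-entries (.query, .key, and so on), which is consistent with the output shown above. Because "." matches any character in a regular expression, passing re.escape(name) + "$" instead would restrict a request to one exact name.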

Replace the patch "convolution" with a simpler linear layer.
Add support for linear layers without input gradients (e.g., vision embeddings).
Fix the patch ordering in get_patches_from_images.
Add the missing causal and cross_document_attention to the LLaVA conversion.
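Replacing the patch "convolution" with a linear layer works because a convolution whose kernel size and stride both equal the patch size, with no padding, applies the same weights to non-overlapping patches. It is therefore equivalent to splitting the image into patches and applying one linear layer to each flattened patch, provided the patches come out in the same order as the convolution's output positions and each patch is flattened in the same layout as the kernel weights. The minimal pure-Python sketch below checks this on a toy image. It assumes a (channels, height, width) layout, dimensions divisible by the patch size, and row-major patch order (left to right, then top to bottom); whether this is exactly the ordering the get_patches_from_images fix restores is an assumption. The helpers get_patches, linear and strided_conv are hypothetical and are not the PR's code.

```python
def get_patches(image, patch_size):
    """Split a [C][H][W] image into flattened patches in row-major patch order.

    Each patch is flattened in (channel, dy, dx) order. Hypothetical helper that
    assumes H and W are divisible by patch_size.
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch_size)
                for dx in range(patch_size)
            ])
    return patches


def linear(patches, weight):
    """Bias-free linear layer: out[i][o] = sum_k weight[o][k] * patches[i][k]."""
    return [[sum(w * x for w, x in zip(row, patch)) for row in weight] for patch in patches]


def strided_conv(image, kernel, patch_size):
    """Reference convolution, kernel size == stride == patch_size, no padding.

    kernel is indexed [out][C][dy][dx]; outputs are returned per position in row-major order.
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    out = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            out.append([
                sum(
                    k[c][dy][dx] * image[c][top + dy][left + dx]
                    for c in range(channels)
                    for dy in range(patch_size)
                    for dx in range(patch_size)
                )
                for k in kernel
            ])
    return out


# Toy 2-channel 4x4 integer image and patch size 2, giving 4 patches of length 8.
patch_size = 2
image = [[[c * 16 + y * 4 + x for x in range(4)] for y in range(4)] for c in range(2)]
# Three output channels with arbitrary integer weights, indexed [out][C][dy][dx].
kernel = [[[[o + c - dy + 2 * dx for dx in range(2)] for dy in range(2)] for c in range(2)] for o in range(3)]
# Flatten each kernel in the same (channel, dy, dx) order used for the patches.
weight = [[k[c][dy][dx] for c in range(2) for dy in range(2) for dx in range(2)] for k in kernel]

patches = get_patches(image, patch_size)
print(patches[1])  # [2, 3, 6, 7, 18, 19, 22, 23]
assert linear(patches, weight) == strided_conv(image, kernel, patch_size)
```

Integer inputs keep the comparison exact. If the patches were emitted in a different order (for example column-major) while the downstream code expected the convolution's row-major layout, the embeddings would be permuted across positions even though each individual patch is embedded correctly.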

tscholak

LGTM!

jlamypoirier added 4 commits November 26, 2025 19:14

stuff · 2384a3c
fixes · df3d4bd
fix · 5f6739a
fixes · 87bf158

jlamypoirier marked this pull request as ready for review November 28, 2025 01:46

fixes · c4ca5f7

jlamypoirier requested review from RaymondLi0 and tscholak November 28, 2025 02:21

tscholak approved these changes Dec 1, 2025


tscholak merged commit 85cdd69 into main Dec 1, 2025
4 checks passed

tscholak deleted the jlp/vision_huggingface branch December 1, 2025 16:39

3 participants

Multimodal hugging Face wrapper, misc improvements, model debugging utils #396

Multimodal hugging Face wrapper, misc improvements, model debugging utils #396

Uh oh!

Conversation

jlamypoirier commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✨ Description

Uh oh!

tscholak left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jlamypoirier commented Nov 27, 2025 •

edited

Loading